Bootstrap-based inferential improvements to the simplex nonlinear regression model

In this paper we evaluate the performance of point and interval estimators based on the maximum likelihood (ML) method for the nonlinear simplex regression model. Inferences based on traditional maximum likelihood estimation have good asymptotic properties, but their performance in small samples may not be satisfactory. We first consider maximum likelihood estimation of the parameters of the nonlinear simplex regression model and then introduce a bootstrap-based bias correction for these estimators. We also develop the percentile and bootstrap t confidence intervals for those parameters as competitors to the traditional approximate confidence interval based on the asymptotic normality of the maximum likelihood estimators (MLEs). We then numerically evaluate the performance of these different methods for estimating the simplex regression model. The numerical evidence favors inference based on the bootstrap method, in particular the bootstrap t interval, which was decisive in an application to real data.


Introduction
Normal linear regression models are widely used in the most diverse areas of knowledge. Currently, there are several proposals of regression models for doubly-constrained response variables, which assume continuous values in (a, b), where a and b are known and −∞ < a < b < ∞; such support can easily be transformed to the unit interval.
In this context, where y ∈ (0, 1) (or y ∈ (a, b)), the normal linear model is inadequate: besides the possibility of fitted values smaller than 0 (a) or larger than 1 (b), the data in general present asymmetry and heteroscedasticity, violating the usual assumptions of that model. Thus, it seems more appropriate to consider models based on distributions naturally supported on (0, 1), as is the case of the simplex regression model proposed by [1], for example.
The simplex distribution was developed from the generalized inverse Gaussian distribution and is part of the class of dispersion models defined by [2], which extend the generalized linear models of [3]. Several studies have used this distribution. For example, [4] used it to evaluate longitudinal data with a constant dispersion parameter, using generalized estimating equations. [5] modified this approach by assuming that the dispersion parameter varies across observations. Based on the dispersion models, and using the Bayesian approach and Monte Carlo simulations, [6] evaluates the estimators of the parameters of the simplex model with variable dispersion.
Other approaches for modeling limited data are the beta [7], Kumaraswamy [8], Johnson $S_B$ [9] and unit gamma [10] regression models. Recently published papers show possible advantages of using the latter distribution over the beta distribution [11,12]. Recently, [13] proposed the class of nonlinear simplex regression models, for which they estimate the model parameters using the maximum likelihood method and derive the local influence quantities. The authors showed that when data are concentrated at the extremes of the standard unit interval, the maximum likelihood estimation process of the simplex model is more stable than that of the beta regression model. [14] presented the zero-and-one-inflated simplex distribution for modeling proportion data. The authors introduced a new algorithm to compute maximum likelihood estimates of the parameters of the simplex distribution without covariates, and developed likelihood-based inference methods for the regression model using this new distribution.
The study of the small-sample behavior of maximum likelihood estimators is an important area of research. These estimators can be biased when the sample size is small or even moderate. The bias is a measure of average risk: the average risk incurred in replacing the true value of the parameter with a plausible estimated value. Bias can also be seen as how far the mean of an estimator is from the true value of the parameter. Thus, it is desirable to obtain estimators with reduced bias in finite samples. When the sample size is large the bias tends to zero. In the literature, there are several ways to obtain less biased estimators in small samples. Here, we shall adopt a bias correction obtained from the bootstrap method [15].
In statistical inference it is of fundamental importance to attach a measure of reliability to the point estimates of the model. One way to do this is to construct interval estimators of the parameters, together with the probability that the intervals contain the true values of these parameters. Confidence intervals can be obtained by assuming that the asymptotic distribution of the maximum likelihood estimators is normal, which may require large samples to ensure the validity of this approximation. In small samples, an alternative for constructing confidence intervals that perform well with respect to both the coverage rate of the true value of the parameter and the length of the interval is the bootstrap method [15]. Specifically, we shall adopt two bootstrap-based confidence intervals, namely the percentile and the bootstrap t intervals. These two schemes typically have empirical coverage rates very close to the nominal ones [16].
Regarding the modeling of limited continuous data, several authors have already proposed improvements to inference based on the maximum likelihood estimation method. [17] propose both the nonlinear beta regression model and improvements for the maximum likelihood estimators. [18] present corrections to the generalized likelihood ratio (LR) statistic based on [19] for the class of beta regression models, whereas [11] used the same strategy for the unit gamma distribution. [20] evaluate the impact of model misspecification on the empirical coverage of different prediction intervals, including three bootstrap prediction intervals. [21] discuss hypothesis testing in small samples in the class of beta regression models. The authors consider the LR test and its bootstrap versions, and show that the standard LR test tends to be quite liberal in small samples and that bootstrap-based tests provide more reliable inference even when the sample size is very small.
In this paper our aim is twofold. First, we develop bootstrap-based inferential improvements for the parameters that index the class of nonlinear simplex regression models proposed by [13], in which the mean of the response variable and the dispersion parameter are related to covariates by means of nonlinear predictors. Second, we jointly evaluate the performance of the competing estimators, namely the MLEs and the bootstrap-based estimators introduced by us.
We evaluate several aspects of interval estimation by Monte Carlo simulations. The bootstrap method proved to be an important estimation tool in nonlinear simplex regression, because through it we can get around several inferential problems of the MLEs in finite samples. Finally, we present an application whose data come from the Chemistry Department of the National University of Colombia.

Nonlinear simplex regression model
In the literature there are several discrete and continuous distributions that belong to the class of dispersion models, among which we can mention the normal, inverse normal, gamma, von Mises, Poisson, binomial and negative binomial distributions, among others. In particular, if a random variable $y$ follows the simplex distribution, denoted by $S^{-}(\mu, \sigma^2)$ with parameters $0 < \mu < 1$ and $\sigma^2 > 0$, its density takes the following form:
$$f(y; \mu, \sigma^2) = \left[2\pi\sigma^2\{y(1-y)\}^3\right]^{-1/2} \exp\left\{-\frac{d(y;\mu)}{2\sigma^2}\right\}, \quad y \in (0,1), \qquad (1)$$
where the deviance component $d(y; \mu)$ is given by
$$d(y;\mu) = \frac{(y-\mu)^2}{y(1-y)\mu^2(1-\mu)^2}.$$
The variance function for the simplex distribution is expressed as $V(\mu) = \mu^3(1-\mu)^3$. The mean and variance of this distribution are given, respectively, by $E(y) = \mu$ and
$$\mathrm{Var}(y) = \mu(1-\mu) - \sqrt{\frac{1}{2\sigma^2}}\, \exp\left\{\frac{1}{2\sigma^2\mu^2(1-\mu)^2}\right\} \Gamma\left\{\frac{1}{2}, \frac{1}{2\sigma^2\mu^2(1-\mu)^2}\right\},$$
where $\Gamma(a, b)$ corresponds to the incomplete gamma function, defined by $\Gamma(a,b) = \int_b^{\infty} x^{a-1} e^{-x}\,dx$. For more details on these properties, see [2]. The simplex distribution is quite flexible for modeling data in the continuous range (0, 1), showing different shapes according to the values of the parameters that index the distribution. Examples include the J shape for $S^{-}(0.9, 36)$, the U shape for $S^{-}(0.5, 121)$ and the inverse J shape for $S^{-}(0.1, 36)$, in addition to the common shapes, namely left-asymmetric, right-asymmetric and symmetric. Also, unlike the beta distribution, the simplex model is very useful for accommodating data with bimodal distributions, for example $S^{-}(0.5, 20)$.
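As a quick numerical illustration of the simplex density described above, the following Python sketch evaluates it and checks, with a simple midpoint rule, that it integrates to one and has mean μ. It assumes the standard form of the density, $f(y;\mu,\sigma^2) = [2\pi\sigma^2\{y(1-y)\}^3]^{-1/2}\exp\{-d(y;\mu)/(2\sigma^2)\}$; the function names, the quadrature rule and the parameter values are illustrative choices and are not taken from the paper.

```python
import math


def simplex_deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)


def simplex_pdf(y, mu, sigma2):
    """Density of S^-(mu, sigma2) at y in (0, 1) (standard form, assumed here)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2 * (y * (1.0 - y)) ** 3)
    return norm * math.exp(-simplex_deviance(y, mu) / (2.0 * sigma2))


def midpoint_integral(f, a=0.0, b=1.0, m=20000):
    """Midpoint rule on (a, b); it never evaluates f at the end points."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))


# Illustrative parameter values (not from the paper).
total_mass = midpoint_integral(lambda y: simplex_pdf(y, 0.5, 1.0))
mean_value = midpoint_integral(lambda y: y * simplex_pdf(y, 0.3, 1.0))
print(total_mass, mean_value)
```

The two printed values should be close to 1 and to the chosen μ = 0.3, respectively, which is a convenient sanity check before using the density inside a likelihood.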
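To provide the quantities related to maximum likelihood estimation we consider the general case, with nonlinearity in the parameters, in which $g(\mu_t) = \eta_t = f_1(x_t; \beta)$ and $h(\sigma_t^2) = \zeta_t = f_2(z_t; \gamma)$, $t = 1, \ldots, n$, where $\beta$ and $\gamma$ are parameter vectors of dimension $k$ and $q$, $x_t$ and $z_t$ are covariate vectors, and $g(\cdot)$ and $h(\cdot)$ are link functions. We emphasize that $f_1(\cdot)$ and $f_2(\cdot)$ are differentiable functions with Jacobian matrices $\tilde{X} = \partial\eta/\partial\beta$ and $\tilde{Z} = \partial\zeta/\partial\gamma$ of dimension $n \times k$ and $n \times q$, respectively. Based on (1), the logarithm of the likelihood function is given by
$$\ell(\beta, \gamma) = \sum_{t=1}^{n} \ell_t(\mu_t, \sigma_t^2), \qquad \ell_t(\mu_t, \sigma_t^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma_t^2 - \frac{3}{2}\log\{y_t(1-y_t)\} - \frac{d(y_t;\mu_t)}{2\sigma_t^2}.$$
The components of the score vector $(U_\beta(\beta,\gamma)^\top, U_\gamma(\beta,\gamma)^\top)^\top$ are given by
$$U_\beta(\beta,\gamma) = \tilde{X}^\top S U T (y - \mu) \quad \text{and} \quad U_\gamma(\beta,\gamma) = \tilde{Z}^\top H a,$$
where $y = (y_1, \ldots, y_n)^\top$, $\mu = (\mu_1, \ldots, \mu_n)^\top$ and $a = (a_1, \ldots, a_n)^\top$ are $n \times 1$ vectors, with $a_t = -1/(2\sigma_t^2) + d(y_t;\mu_t)/\{2(\sigma_t^2)^2\}$, $S = \mathrm{diag}\{1/\sigma_1^2, \ldots, 1/\sigma_n^2\}$, $T = \mathrm{diag}\{1/g'(\mu_1), \ldots, 1/g'(\mu_n)\}$, $H = \mathrm{diag}\{1/h'(\sigma_1^2), \ldots, 1/h'(\sigma_n^2)\}$, and $U = \mathrm{diag}\{u_1, \ldots, u_n\}$ is a diagonal matrix in which the t-th component is defined as
$$u_t = \frac{1}{\mu_t(1-\mu_t)}\left\{ d(y_t;\mu_t) + \frac{1}{\mu_t^2(1-\mu_t)^2} \right\}.$$
To obtain the Fisher information matrix for the parameter vectors $\beta$ and $\gamma$, we shall use the following results: $E[d(y;\mu)] = \sigma^2$, $E[(y-\mu)d'(y;\mu)] = -2\sigma^2$ and $E[(y-\mu)d(y;\mu)] = 0$ [4]; $E[(y-\mu)d''(y;\mu)] = 0$, $\frac{1}{2}E[d''(y;\mu)] = \frac{3\sigma^2}{\mu(1-\mu)} + \frac{1}{\mu^3(1-\mu)^3}$ and $\mathrm{Var}[d(y;\mu)] = 2(\sigma^2)^2$ [23]; and $E[d'(y;\mu)] = 0$ [5]. The Fisher information matrix for the parameter vector $\theta = (\beta^\top, \gamma^\top)^\top$, denoted here by $K(\beta, \gamma)$, is a block diagonal matrix with blocks $K_{\beta\beta}$ and $K_{\gamma\gamma}$ defined as $K_{\beta\beta} = \tilde{X}^\top S W \tilde{X}$ and $K_{\gamma\gamma} = \tilde{Z}^\top D \tilde{Z}$. Here, $W = \mathrm{diag}\{w_1, \ldots, w_n\}$ and $D = \mathrm{diag}\{d_1, \ldots, d_n\}$ with
$$w_t = \left\{ \frac{3\sigma_t^2}{\mu_t(1-\mu_t)} + \frac{1}{\mu_t^3(1-\mu_t)^3} \right\} \frac{1}{\{g'(\mu_t)\}^2} \quad \text{and} \quad d_t = \frac{1}{2(\sigma_t^2)^2\{h'(\sigma_t^2)\}^2}.$$
Because $K(\beta,\gamma)$ is block diagonal, the MLEs $\hat{\beta}$ and $\hat{\gamma}$ are asymptotically independent. For large samples and under regularity conditions the approximate distribution of the MLEs is given by
$$\begin{pmatrix} \hat{\beta} \\ \hat{\gamma} \end{pmatrix} \sim N_{k+q}\left( \begin{pmatrix} \beta \\ \gamma \end{pmatrix}, K^{-1} \right), \qquad K^{-1} = \mathrm{diag}\{K_{\beta\beta}^{-1}, K_{\gamma\gamma}^{-1}\}. \qquad (4)$$
To measure the degree of non-constant dispersion, we define $\lambda = \sigma^2_{\max}/\sigma^2_{\min}$, where $\sigma^2_{\max}$ and $\sigma^2_{\min}$ are the largest and smallest values of $\sigma_t^2$, $t = 1, \ldots, n$. Note that the greater the $\lambda$, the further away the simplex regression model with varying dispersion is from the model in which the dispersion is supposed to be fixed, since the constant dispersion model holds that $\sigma_1^2 = \ldots = \sigma_n^2 = \sigma^2$, in which case $\lambda = 1$.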
Furthermore, $\lambda$ measures how an increase in the response variance affects the estimation process of the model. For $\sigma^2$ to vary across observations, $\sigma^2_{\max}$ must increase; otherwise $\sigma^2_{\min}$ would have to be implausibly small. Thus, in real problems, the greater $\lambda$ is, the greater the response variances are. In the simulations we control the value of the maximum variance, because what we want is to evaluate the properties of the estimators when the variance does not explode, but only grows slightly. When working with real data, large values of the estimated $\lambda$ occur frequently, i.e., $\lambda > 1$ (in particular when $n$ is large).
We still need to discuss the variances of the responses further. The first part of the expression in (4) implies that the vector $(\hat{\beta}^\top, \hat{\gamma}^\top)^\top$ is asymptotically unbiased. Thus, as the sample size increases, $(\hat{\beta}^\top, \hat{\gamma}^\top)^\top$ is approximately unbiased and its bias should be close to zero. In theory this holds only when $n$ approaches infinity, that is, asymptotically. In practice, the better the approximation in (4), which depends on the distribution, the faster the bias goes to zero; this can occur already for sample sizes such as n = 40, 50, and so on.
However, this assumption is mostly valid for $\hat{\beta}$ due to its relationship with $\hat{\mu}$, which is typically exactly (and not just approximately) unbiased in theory. On the other hand, $\hat{\gamma}$ is related to $\hat{\sigma}^2$, which is theoretically biased for most distributions. Thus, it is expected that the bias of $\hat{\gamma}$ converges to zero slowly and that large sample sizes are required for this to occur.
This discussion reveals that we should pay closer attention to how the corrections act on $\hat{\gamma}$. Note that a biased $\hat{\gamma}$ induces biased response variances. As a consequence, hypothesis tests and confidence intervals may perform poorly and lead to misleading conclusions about the model, such as the exclusion of important covariates.
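The score contributions discussed in this section can be checked numerically for a single observation. The sketch below assumes the per-observation log-likelihood implied by the standard simplex density, together with the derivatives $\partial\ell_t/\partial\mu_t = (y_t-\mu_t)u_t/\sigma_t^2$ and $\partial\ell_t/\partial\sigma_t^2 = a_t$, where $u_t = \{d(y_t;\mu_t) + 1/[\mu_t^2(1-\mu_t)^2]\}/\{\mu_t(1-\mu_t)\}$ and $a_t = -1/(2\sigma_t^2) + d(y_t;\mu_t)/\{2(\sigma_t^2)^2\}$. It compares them with central finite differences; the numerical values of y, μ and σ² are arbitrary illustrations.

```python
import math


def deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)


def loglik_t(y, mu, sigma2):
    """Log-density of one simplex observation (standard form, assumed)."""
    return (-0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2)
            - 1.5 * math.log(y * (1 - y)) - deviance(y, mu) / (2 * sigma2))


def u_t(y, mu):
    """Weight appearing in the score for the mean parameter."""
    return (deviance(y, mu) + 1 / (mu ** 2 * (1 - mu) ** 2)) / (mu * (1 - mu))


def a_t(y, mu, sigma2):
    """Derivative of the log-density with respect to sigma2."""
    return -1 / (2 * sigma2) + deviance(y, mu) / (2 * sigma2 ** 2)


# Arbitrary illustrative values (not from the paper).
y, mu, sigma2, eps = 0.35, 0.6, 2.0, 1e-6

analytic_dmu = (y - mu) * u_t(y, mu) / sigma2
analytic_ds2 = a_t(y, mu, sigma2)

# Central finite differences of the log-density.
numeric_dmu = (loglik_t(y, mu + eps, sigma2) - loglik_t(y, mu - eps, sigma2)) / (2 * eps)
numeric_ds2 = (loglik_t(y, mu, sigma2 + eps) - loglik_t(y, mu, sigma2 - eps)) / (2 * eps)

print(analytic_dmu, numeric_dmu)
print(analytic_ds2, numeric_ds2)
```

Agreement between the analytic and numerical derivatives is a cheap safeguard before coding the full score vector and a Fisher scoring iteration.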

Point estimation of the model parameters
The maximum likelihood estimators of $\beta$ and $\gamma$ are obtained by Fisher's iterative scoring process, with the initial guess proposed in [13], and are usually biased when the sample size is small or even moderate, particularly $\hat{\gamma}$. Nevertheless, the estimator's bias can be corrected, and one possibility is to use resampling methods, which are schemes that repeatedly sample from the same sample to calculate estimates. The bootstrap method is one of the most widely used resampling methods, and one that gives very satisfactory results for estimating a model. In this paper we adopt the parametric bootstrap, which in the regression context assumes that the probability distribution of the response variable is known and indexed by unknown parameters [15]. The steps for performing this method, both for bias correction of the MLEs and for obtaining the confidence intervals, are described in Algorithms (1), (2) and (3).

Algorithm 1: Parametric method
1: Suppose that $y = (y_1, \ldots, y_n)^\top$ is a random sample such that each $y_t$, $t = 1, \ldots, n$, follows a distribution $F$ supposedly known and indexed by the parameter vector $\theta$;
2: From the original sample, obtain the estimate $\hat{\theta}$ of $\theta$;
3: Generate $B$ bootstrap samples of size $n$, namely $y^{*b} = (y_1^{*}, \ldots, y_n^{*})^\top$, $b = 1, \ldots, B$, from $F$ indexed by $\hat{\theta}$;
4: For each bootstrap sample, compute the bootstrap estimate $\hat{\theta}^{*b}$ of $\theta$.
Once the estimate of the estimator's bias is obtained we can construct the bias-corrected point estimators. Using the steps of the bootstrap method presented in Algorithm (1), a bootstrap estimate of the bias can be obtained by $\widehat{\mathrm{bias}}(\hat{\theta}) = \bar{\theta}^{*} - \hat{\theta}$, with $\bar{\theta}^{*} = B^{-1}\sum_{b=1}^{B} \hat{\theta}^{*b}$, i.e., it is possible to approximate the expected value by the arithmetic mean of the bootstrap estimates of $\theta$. Thus, we can obtain an estimator corrected up to second order by bootstrap [15,25]:
$$\tilde{\theta} = \hat{\theta} - \widehat{\mathrm{bias}}(\hat{\theta}) = 2\hat{\theta} - \bar{\theta}^{*}.$$
This estimator has the same asymptotic properties as the usual MLE and presents lower bias in small samples [16]. A detailed discussion of the bootstrap second-order bias correction and its relation to the analytic correction can be found in [26].
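The parametric bootstrap bias correction can be sketched in a few lines of Python. To keep the example self-contained, the simplex model is replaced here by a much simpler stand-in: an exponential sample whose rate MLE, 1/mean, is known to be biased upward in small samples. This swap, the function names, the seed and the values n = 20 and B = 2000 are all assumptions made for illustration; the correction itself is the usual one, corrected estimate = 2 × MLE − mean of the bootstrap estimates.

```python
import random
import statistics


def mle_rate(sample):
    """MLE of the exponential rate (stand-in for the simplex MLE)."""
    return 1.0 / statistics.fmean(sample)


def simulate_exponential(rate, n, rng):
    """Draw one parametric bootstrap sample from the fitted model."""
    return [rng.expovariate(rate) for _ in range(n)]


def bootstrap_bias_correct(sample, estimator, simulate, B=2000, seed=1):
    """Return (MLE, bias-corrected estimate) via the parametric bootstrap."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    # Refit the model on B samples generated from the fitted distribution.
    boot = [estimator(simulate(theta_hat, n, rng)) for _ in range(B)]
    bias_hat = statistics.fmean(boot) - theta_hat
    return theta_hat, theta_hat - bias_hat  # equals 2*theta_hat - mean(boot)


# Hypothetical data: n = 20 observations with true rate 2.0.
data_rng = random.Random(0)
y = simulate_exponential(2.0, 20, data_rng)
theta_hat, theta_tilde = bootstrap_bias_correct(y, mle_rate, simulate_exponential)
print(theta_hat, theta_tilde)
```

For the simplex regression model the same skeleton would apply with the estimator replaced by the Fisher scoring fit and the simulator by draws from the fitted simplex distribution, neither of which is implemented here.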

Interval estimation of the model parameters
A set constructed from a point estimator, together with the probability that this set contains the true value of the parameter, defines a confidence interval estimator. The general form for approximate confidence intervals (CI) for $\theta$ is $P[l_1 \le \theta \le l_2] \approx 1 - \alpha$, $0 < \alpha < 1$, where $l_1$ and $l_2$ ($l_1 < l_2$) are the lower and upper bounds of the confidence interval, respectively, and $1 - \alpha$ is the confidence level, which the coverage probability approaches asymptotically. We should emphasize that $l_1$ and $l_2$ are quantiles of a distribution indexed by the parameter $\theta$.
If we assume that this distribution is known, it is possible to construct exact confidence intervals. However, deriving the exact analytical distribution of a random variable is typically highly challenging. Fortunately, there are diverse approaches to building approximate confidence intervals. The most widely used is the asymptotic confidence interval, which assumes asymptotic normality of the MLEs. According to (4), for the simplex model in large samples the distribution of $\hat{\theta} = (\hat{\beta}^\top, \hat{\gamma}^\top)^\top$ is approximately normal with mean equal to $\theta = (\beta^\top, \gamma^\top)^\top$ and the variance and covariance matrix given by (4). More precisely, we have that, approximately, $\hat{\beta} \sim N_k(\beta, K^{\beta\beta})$ and $\hat{\gamma} \sim N_q(\gamma, K^{\gamma\gamma})$, where $K^{\beta\beta} = (\tilde{X}^\top S W \tilde{X})^{-1}$ evaluated at $\hat{\theta}$ is the $k \times k$ matrix of variances and covariances of $\hat{\beta}$ and $K^{\gamma\gamma} = (\tilde{Z}^\top D \tilde{Z})^{-1}$ evaluated at $\hat{\theta}$ is the $q \times q$ matrix of variances and covariances of $\hat{\gamma}$.
Consider $\beta_i$ and $\gamma_j$, with $i = 1, \ldots, k$ and $j = 1, \ldots, q$, the i-th and j-th components of the vectors $\beta$ and $\gamma$, respectively. We shall denote by $K^{\beta\beta}_{ii}$ and $K^{\gamma\gamma}_{jj}$ the i-th and j-th elements of the main diagonal of the matrices $K^{\beta\beta}(\hat{\theta})$ and $K^{\gamma\gamma}(\hat{\theta})$, respectively. Therefore, it follows that the intervals
$$\hat{\beta}_i \pm z_{1-\alpha/2}\sqrt{K^{\beta\beta}_{ii}} \quad \text{and} \quad \hat{\gamma}_j \pm z_{1-\alpha/2}\sqrt{K^{\gamma\gamma}_{jj}}$$
have confidence approximately equal to $1 - \alpha$ for $\beta_i$ and $\gamma_j$, respectively, where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution. These intervals based on the MLEs may require large samples for the coverage to be close to the nominal level. In small samples, they can have large coverage errors [15,25]. A workaround for improving confidence intervals in small samples, without analytical complexities, is the bootstrap method. This approach typically provides confidence intervals whose empirical coverage is close to the nominal level. Here, we shall discuss two bootstrap-based strategies for constructing confidence intervals, namely the percentile and the bootstrap t intervals.
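The asymptotic interval described above is straightforward to compute once an estimate and the corresponding diagonal element of the inverse Fisher information are available. The following sketch uses only the Python standard library; the function name and the numerical inputs (estimate 1.2, variance 0.04) are hypothetical and serve only to illustrate the formula estimate ± z × standard error.

```python
import math
from statistics import NormalDist


def asymptotic_interval(estimate, variance, level=0.95):
    """Wald-type interval: estimate -/+ z_{1-alpha/2} * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs: an estimate and its asymptotic variance,
# i.e. one diagonal element of the inverse information matrix.
lower, upper = asymptotic_interval(1.2, 0.04, level=0.95)
print(lower, upper)
```

By construction this interval is symmetric around the estimate, which is one reason it can be poorly balanced when the sampling distribution of the estimator is skewed.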
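The bootstrap t confidence interval, here called 'Boot t' [16], is a pivotal method for constructing confidence intervals that builds on the traditional Student t confidence interval. This interval is based on the bootstrap estimate of the distribution of $T$, where $T$ is given by
$$T = \frac{\hat{\theta} - \theta}{\mathrm{ep}(\hat{\theta})},$$
and $\mathrm{ep}(\hat{\theta})$ is the standard error of $\hat{\theta}$. The construction of the bootstrap t confidence interval is given by Algorithm 3.
1: Generate $B$ bootstrap samples as in Algorithm 1;
2: For each bootstrap sample, compute $T^{*b} = (\hat{\theta}^{*b} - \hat{\theta})/\widehat{\mathrm{ep}}^{*b}$. Note that $\mathrm{ep} = \kappa(\theta)$, with $\kappa$ a known function, and $\widehat{\mathrm{ep}}^{*b} = \kappa(\hat{\theta}^{*b})$;
3: The $\alpha/2$ and $1 - \alpha/2$ percentiles of $T^{*b}$ are estimated by the values $\hat{t}^{(\alpha/2)}$ and $\hat{t}^{(1-\alpha/2)}$, respectively, such that
$$\frac{\#\{T^{*b} \le \hat{t}^{(\alpha/2)}\}}{B} = \frac{\alpha}{2} \quad \text{and} \quad \frac{\#\{T^{*b} \le \hat{t}^{(1-\alpha/2)}\}}{B} = 1 - \frac{\alpha}{2}.$$
Thus, the bootstrap t confidence interval is given by
$$\left[\hat{\theta} - \hat{t}^{(1-\alpha/2)}\,\mathrm{ep}(\hat{\theta});\ \hat{\theta} - \hat{t}^{(\alpha/2)}\,\mathrm{ep}(\hat{\theta})\right].$$
The quantities $\hat{t}^{(\alpha/2)}$ and $\hat{t}^{(1-\alpha/2)}$ can be obtained as follows:
2. The quantiles $\hat{t}^{(\alpha/2)}$ and $\hat{t}^{(1-\alpha/2)}$ are, respectively, the ordered replicates in positions $B \times (\alpha/2)$ and $B \times (1 - \alpha/2)$;
2.1. If $B \times (\alpha/2)$ and $B \times (1 - \alpha/2)$ are not integers, we can use the following procedure: assuming $0 < \alpha < 1$, let $k = [(B + 1)\alpha/2]$ be the largest integer less than or equal to $(B + 1)\alpha/2$. Then the bootstrap quantiles $\hat{t}^{(\alpha/2)}$ and $\hat{t}^{(1-\alpha/2)}$ are given, respectively, by the k-th and the (B + 1 − k)-th ordered values of $T^{*b}$.
Therefore, the bootstrap t intervals for the parameters of the simplex nonlinear regression model are given by
$$\left[\hat{\beta}_i - \hat{t}_{\beta_i}^{(1-\alpha/2)}\,\mathrm{ep}(\hat{\beta}_i);\ \hat{\beta}_i - \hat{t}_{\beta_i}^{(\alpha/2)}\,\mathrm{ep}(\hat{\beta}_i)\right] \quad \text{and} \quad \left[\hat{\gamma}_j - \hat{t}_{\gamma_j}^{(1-\alpha/2)}\,\mathrm{ep}(\hat{\gamma}_j);\ \hat{\gamma}_j - \hat{t}_{\gamma_j}^{(\alpha/2)}\,\mathrm{ep}(\hat{\gamma}_j)\right],$$
with $i = 1, \ldots, k$ and $j = 1, \ldots, q$ indexing the i-th and j-th components of the vectors $\beta$ and $\gamma$. According to [16], the bootstrap t intervals have good theoretical coverage, with empirical coverages closer to the nominal levels than those of the asymptotic interval, but tend to be erratic in actual practice. Percentile intervals are less erratic, but display less satisfactory coverage performances. An outstanding discussion on bootstrap-based confidence intervals can be found in [27]. In what follows we shall evaluate the finite-sample performances of the confidence intervals introduced in this section.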
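The two bootstrap intervals can be sketched as follows, taking as given a list of bootstrap estimates and their bootstrap standard errors (in the simplex model these would come from refitting the model on each parametric bootstrap sample, which is not done here). The quantile rule uses the k-th and (B + 1 − k)-th ordered values with k = ⌊(B + 1)α/2⌋; the function names and the toy inputs below are illustrative assumptions, chosen so that the result can be verified by hand.

```python
import math


def empirical_quantiles(values, alpha):
    """k-th and (B+1-k)-th order statistics, with k = floor((B+1)*alpha/2)."""
    B = len(values)
    k = max(1, math.floor((B + 1) * alpha / 2))
    ordered = sorted(values)
    return ordered[k - 1], ordered[B - k]


def percentile_interval(boot_estimates, alpha):
    """Percentile interval: quantiles of the bootstrap estimates themselves."""
    return empirical_quantiles(boot_estimates, alpha)


def bootstrap_t_interval(theta_hat, se_hat, boot_estimates, boot_ses, alpha):
    """Bootstrap t interval built from studentized bootstrap replicates."""
    t_stats = [(tb - theta_hat) / sb for tb, sb in zip(boot_estimates, boot_ses)]
    t_low, t_high = empirical_quantiles(t_stats, alpha)
    # The upper quantile of T* sets the lower limit, and vice versa.
    return theta_hat - t_high * se_hat, theta_hat - t_low * se_hat


# Toy inputs (not simulation output): B = 999 evenly spaced "estimates".
boot_estimates = [float(i) for i in range(1, 1000)]
boot_ses = [10.0] * len(boot_estimates)
pct = percentile_interval(boot_estimates, alpha=0.10)
boot_t = bootstrap_t_interval(500.0, 10.0, boot_estimates, boot_ses, alpha=0.10)
print(pct, boot_t)
```

With constant bootstrap standard errors and a symmetric set of replicates, as in this toy input, the two intervals coincide; they differ when the standard error varies with the estimate or the bootstrap distribution is skewed.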

Numerical results on point estimation
In this section we present the results of the Monte Carlo simulations carried out to evaluate the performances of the maximum likelihood estimators of the nonlinear simplex regression model and of their bootstrap versions in small samples. In what follows we assume the following nonlinear simplex regression model:
$$g(\mu_t) = \log\left(\frac{\mu_t}{1-\mu_t}\right) = \beta_1 + x_{t2}^{\beta_2} + \beta_3 x_{t3} + \beta_4 x_{t4} \quad \text{and} \quad h(\sigma_t^2) = \log(\sigma_t^2) = \gamma_1 + z_t^{\gamma_2}, \quad t = 1, \ldots, n, \qquad (6)$$
where $g(\cdot)$ and $h(\cdot)$ are the logit and logarithmic link functions, respectively. The realizations of the covariates were generated using the uniform distribution as follows: $x_{t2} \sim U(0.5, 1.5)$, $x_{t3} \sim U(0, 1)$, $x_{t4} \sim U(-0.5, 0.5)$ and $z_t \sim U(0.5, 1.5)$, and they were kept fixed across Monte Carlo replications. Three different scenarios were considered for the mean response, namely $\mu_t \in (0.02, 0.32)$, $\mu_t \in (0.19, 0.86)$ and $\mu_t \in (0.78, 0.98)$. Furthermore, concerning the degree of non-constant dispersion, we report here the results for $\lambda \approx 12$, 45 and 128. The sample sizes chosen were n = 40, 80 and 120. For the last two cases we initially generated n = 40 covariate observations, which were then replicated twice and three times to obtain the sample sizes n = 80 and n = 120, respectively. This was done to ensure that the non-constant dispersion intensity was the same for all sample sizes. The numbers of Monte Carlo and bootstrap replications were R = 10000 and B = 500, respectively. The parameter estimates in (6) were obtained by maximizing the log-likelihood function using Fisher's scoring method. For each Monte Carlo replicate, and given the corresponding maximum likelihood estimate of the model parameters, B bootstrap replicates were generated. Thus, at the end of the bootstrap some quantities regarding the parameters are estimated, namely the corrected bootstrap estimates and the bootstrap confidence intervals, percentile and bootstrap t. Finally, outside the bootstrap, the asymptotic intervals of the parameters are also computed based on the quantiles of the standard normal distribution.
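The replication device used for the covariates can be illustrated with a short sketch: the dispersion covariate is generated once for n = 40 and then stacked to reach n = 80 and n = 120, so that the ratio λ = σ²_max/σ²_min is identical for all sample sizes. The dispersion submodel log σ²_t = γ_1 + z_t^{γ_2} follows the simulation model, whereas the seed and the values of γ_1 and γ_2 below are hypothetical and are not the ones used in the paper.

```python
import math
import random

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# Base design with n = 40: z_t ~ U(0.5, 1.5), kept fixed afterwards.
z40 = [rng.uniform(0.5, 1.5) for _ in range(40)]


def replicate(values, times):
    """Stack the base covariate values to obtain a larger sample size."""
    return list(values) * times


def dispersion(z, gamma1, gamma2):
    """sigma_t^2 under the log link: log(sigma_t^2) = gamma1 + z_t**gamma2."""
    return [math.exp(gamma1 + zt ** gamma2) for zt in z]


def heterogeneity(sigma2):
    """Degree of non-constant dispersion: lambda = max / min of sigma_t^2."""
    return max(sigma2) / min(sigma2)


gamma1, gamma2 = -1.0, 2.0  # hypothetical values, not those of the paper
z80, z120 = replicate(z40, 2), replicate(z40, 3)
lam40 = heterogeneity(dispersion(z40, gamma1, gamma2))
lam80 = heterogeneity(dispersion(z80, gamma1, gamma2))
lam120 = heterogeneity(dispersion(z120, gamma1, gamma2))
print(lam40, lam80, lam120)
```

Because the larger designs contain exactly the same covariate values as the base design, the three printed values of λ are identical, which is the property the replication scheme is meant to guarantee.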
To evaluate the performance of the point estimators, the relative bias and the square root of the mean square error were calculated for each sample size. Additionally, we introduce a measure suggested during the review of the article, which we shall call the Unified Quadratic Bias (UQB) and which aggregates the squared biases of the parameter estimators under a square root. In Tables 1-3 we consider, respectively, the scenarios where $\mu_t \in (0.02, 0.32)$ ($\mu_t \approx 0$), $\mu_t \in (0.19, 0.86)$ ($\mu_t \approx 0.5$) and $\mu_t \in (0.78, 0.98)$ ($\mu_t \approx 1$), $t = 1, \ldots, n$. These tables report the relative biases and the square roots of the mean square errors (RMSEs) of the parameter estimators for n = 40, 80 and 120 and $\lambda \approx 12$, 45 and 128. We observe that, in absolute value, the estimates of the relative bias of the bootstrap corrected estimators are smaller than those of the maximum likelihood estimators, evidencing the efficacy of the bootstrap scheme in bias correction. For example, the relative bias estimate of the $\tilde{\beta}_3$ (BOOT) estimator is equal to 0.0003, while that of $\hat{\beta}_3$ (MLE-asymptotic) is 0.001. For $\mu_t \in (0.19, 0.86)$, n = 120 and $\lambda \approx 12$, the estimated bias is equal to 0.001 for $\hat{\beta}_2$ and < 0.0001 for $\tilde{\beta}_2$. In fact, the high performance of the bootstrap correction when $\mu_t \in (0.19, 0.86)$ is noteworthy, since its estimators exhibit lower biases than the MLEs-asymptotic for all model parameters, for the different levels of non-constant dispersion and for all sample sizes. In all scenarios considered, we note that the RMSEs of the estimators decrease when the sample size increases.
As expected, the MLEs-asymptotic of the parameters of the dispersion submodel tend to be more biased than those of the mean submodel, especially that of $\gamma_1$. For instance, for $\mu_t \in (0.02, 0.32)$, n = 120 and $\lambda \approx 45$, the relative bias estimate of $\hat{\gamma}_1$ is equal to 0.043, while that of $\tilde{\gamma}_1$ is < 0.0001. Even more striking are the biases of $\hat{\gamma}_1$ and $\hat{\gamma}_2$, which drop from (0.136, −0.010) to (−0.001, 0.001), respectively, after the bootstrap correction when n = 40 and $\lambda \approx 45$. For $\hat{\gamma}_1$ in particular, the bootstrap correction provides a substantial reduction of the estimated bias. This is important since the correct estimation of the dispersion submodel parameters directly affects the estimates of the response variances, which, when corrected, produce Z-tests that lead to more reliable decisions. The corrections were also effective for $\hat{\beta}_i$, i = 1, 2, 3, 4: the goal is for the bias values to be as close to zero as possible, and for these parameters the estimated bias after correction was sometimes < 0.0001, i.e., the goal was achieved.
It is important to note that the estimated biases of the usual and corrected maximum likelihood estimators are notably smaller when the mean of the response variable is close to the upper limit of the unit interval than for the two other scenarios considered (Tables 1 and 3). Based on the Unified Quadratic Bias measure, the effectiveness of the bias correction we propose becomes even more evident. Consider the results for $\mu_t \in (0.78, 0.98)$ and n = 40: for $\lambda \approx 12$, 45 and 128, the values of the UQB for the original MLEs are equal to 0.132, 0.137 and 0.138, whereas for the corrected version these values become 0.005, 0.001 and 0.002, respectively (Table 3).
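The two standard summaries used in this section can be computed from the Monte Carlo estimates of a single parameter as sketched below. The definitions assumed here are the usual ones, relative bias = (mean of the estimates − true value)/true value and RMSE = square root of the mean squared deviation from the true value; the function names and the four estimates in the example are made up for illustration and are not simulation output.

```python
import math
import statistics


def relative_bias(estimates, true_value):
    """(Monte Carlo mean of the estimates - true value) / true value."""
    return (statistics.fmean(estimates) - true_value) / true_value


def rmse(estimates, true_value):
    """Square root of the Monte Carlo mean squared error."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(statistics.fmean(squared_errors))


# Made-up estimates of a parameter whose true value is 1.0.
estimates = [1.1, 0.9, 1.2, 1.0]
print(relative_bias(estimates, 1.0), rmse(estimates, 1.0))
```

In a full simulation these functions would be applied, parameter by parameter, to the R Monte Carlo estimates of both the MLEs and the bootstrap-corrected estimators.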

Numerical results on confidence intervals
Concerning interval estimation, we computed the empirical coverage of the intervals (%), obtained from the relative frequency with which the intervals contained the true value of the parameter. The lower and upper bounds were also estimated (by averaging over the Monte Carlo replications), so that we were able to estimate the average length of the intervals and the left and right non-coverage rates. The left rate counts the replications in which the interval upper limit is less than the true value of the parameter, and the right rate counts those in which the interval lower limit is greater than the true value of the parameter.
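These interval summaries can be sketched as follows, given the list of intervals obtained across Monte Carlo replications for one parameter. The left and right non-coverage rates follow the convention stated above (upper limit below the true value counts as left, lower limit above it counts as right); the function name, the returned keys and the four toy intervals are illustrative assumptions.

```python
def interval_summary(intervals, true_value):
    """Coverage, left/right non-coverage (%), mean bounds and mean length."""
    R = len(intervals)
    covered = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    left = sum(1 for lo, hi in intervals if hi < true_value)   # interval lies below
    right = sum(1 for lo, hi in intervals if lo > true_value)  # interval lies above
    mean_lower = sum(lo for lo, _ in intervals) / R
    mean_upper = sum(hi for _, hi in intervals) / R
    return {
        "coverage": 100.0 * covered / R,
        "left": 100.0 * left / R,
        "right": 100.0 * right / R,
        "lower": mean_lower,
        "upper": mean_upper,
        "size": mean_upper - mean_lower,
    }


# Toy intervals for a parameter whose true value is 1.0 (not simulation output).
toy = [(0.5, 1.5), (1.1, 2.0), (0.0, 0.9), (0.8, 1.2)]
summary = interval_summary(toy, 1.0)
print(summary)
```

An interval estimator is well balanced when the left and right rates are close to each other, each near half of the nominal non-coverage.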

Table 1. Relative biases and root mean square errors of the maximum likelihood estimators (MLEs-asymptotic) and bootstrap-corrected MLEs of the model parameters.
In what follows we report the results of Monte Carlo simulations on interval estimation. We consider only the nominal levels 0.90 and 0.95, reported in Tables 4 and 5, respectively. These tables display the coverage rates of the following competing interval estimators: the asymptotic ML interval (ML-Ia), the bootstrap t (Boot t) and the percentile (Bootp), for the model parameters in (6).
Regarding coverage rates, the interval that performs best is the bootstrap t, with empirical coverage closest to the nominal levels for all model parameters. The asymptotic confidence interval displayed considerable undercoverage, and the percentile confidence interval Bootp overall outperforms the ML-Ia; only for $\gamma_1$ does the Bootp display a poor performance. For instance, for $1 - \alpha = 0.90$, the ML-Ia, Boot t and Bootp coverage rates are around 0.87, 0.95 and 0.74, respectively (Table 5). Now consider n = 120, $1 - \alpha = 0.95$, $\lambda = 45$ and $\mu_t \in (0.19, 0.86)$: for $\beta_3$, those values are equal to 0.933, 0.942 and 0.934 (Table 8). That is, even when the sample size increases, the empirical coverage of the Boot t interval is the closest to the nominal level. Our interest hereafter lies in evaluating some interval properties only for the nonlinearity parameters of the mean and the dispersion submodels, namely $\beta_2$ and $\gamma_2$. Tables 6 through 9 present the mean lower (Lower) and upper (Upper) bounds, mean lengths (Size), the empirical probability of coverage (Coverage), as well as the left and right non-coverage rates (%) of the interval estimators of these parameters. These last two quantities evaluate the balance of the interval. Perfect balance occurs when these two percentages are identical.
The empirical coverage probability of the Boot t interval is the closest to the nominal one; however, its mean length is larger than those of the other two intervals, which allows the inclusion of $\beta_2 = 1.0$, i.e., linearity (Table 6). This behavior of the Boot t interval holds for n = 80 (Table 7). When $\lambda \approx 128$ all intervals include the possibility of linearity when n = 40, i.e., $\beta_2 = 1.0$ (Table 6), for all nominal confidence levels. This result is interesting. Our focus hereafter is to show how the Boot t considerably outperforms its competitors in accuracy and balance for the $\beta_2$ interval. We fix $1 - \alpha = 0.95$ and consider n = 40 and three values of $\lambda$, summarizing each interval estimator by its empirical coverage and its left and right non-coverage rates. Table 9 presents the simulation results for $\beta_2$ ($\mu_t \approx 0.5$ and $\mu_t \approx 1$), when n = 40, $\lambda \approx 128$ and $\beta_2$ equal to −1.8 and −1.5, respectively. We note that for the three nominal levels and the different scenarios, the asymptotic confidence interval has the shortest average length. We also note that the bootstrap t confidence interval presents the best empirical coverage and balance properties, followed by the Bootp interval, which had values very similar to those of the ML-Ia interval.
Figs 1 and 2 contain histograms constructed from the 10000 maximum likelihood estimates of the parameters $\beta_2$ and $\gamma_2$, respectively, for n = 40, $\lambda \approx 128$ and the different scenarios for $\mu_t$, $t = 1, \ldots, n$. The distinct lines represent the different confidence intervals under evaluation, and their lengths correspond to the respective average lengths. The values below and above the vertical lines are the non-coverage rates, meaning the percentages of replicates in which the true value of the parameter was smaller than the lower limit of the interval (below) and larger than the upper limit of the interval (above). These graphics were designed according to [28]. Through them it is possible to verify that for the different $\mu_t$ scenarios, the analyzed intervals are approximately symmetrical around the true value of $\beta_2$. We further note that for $\mu_t \in (0.02, 0.32)$, the intervals were better balanced when compared to the scenarios where $\mu_t \in (0.19, 0.86)$ and $\mu_t \in (0.78, 0.98)$. Overall, the bootstrap t confidence interval stands out as better balanced. Fig 2 shows that only the asymptotic confidence interval is approximately symmetric around the true value of $\gamma_2$.

Table 5. Coverage rates of the interval estimators ML-Ia, Boot t and Bootp for θ, the model parameters.

Table 3. Relative biases and root mean square errors of the maximum likelihood estimators (MLEs-asymptotic) and bootstrap-corrected MLEs of the model parameters.
The bootstrap t confidence interval is slightly asymmetric around $\gamma_2$. The bootstrap percentile confidence interval exhibits very strong asymmetry, especially for the nominal 99% level. We also observe that the asymptotic confidence interval is strongly unbalanced, as the right non-coverage rates (% Right) are markedly higher than the left ones (% Left). In contrast, the bootstrap t and percentile confidence intervals are approximately balanced for all nominal levels and scenarios. Therefore, based on the results presented, we suggest using the bootstrap t confidence interval, which showed the best coverage and balance performances.

Application: Fluid Catalytic Cracking Data (FCC)
In this application the data come from the Chemistry Department of the National University of Colombia [29] and concern a process regarding the volume and quality of gasoline produced in a refinery. Fluid catalytic cracking (FCC) is used to convert high molecular weight hydrocarbons into small molecules of higher commercial value by contacting them with a catalyst. This process is often described as the heart of the refinery, as it allows production to be tailored to products in higher demand and, especially, with higher profit [29]. The process catalyst consists of fine, easily fluidizable particles of 10 to 150 microns, with zeolite Y [29] as the main component. Another important substance that participates in the catalysis process is vanadium. This chemical component is known to participate in catalyst destruction, reducing the active surface, selectivity and crystallinity of the zeolite Y, especially in the presence of steam. Every 1000 ppm of vanadium in the catalyst is known to reduce gasoline yield by about 2.3%. The process also depends on the temperature, which must be close to 720 °C [29]. The data set consists of 28 observations. To fit a model to these data, [13] chose the following candidate covariates: steam ($x_2$), temperature ($x_3$) and vanadium concentration ($x_4$). Initially, the authors defined a linear predictor relating these covariates to unknown parameters. However, the residual analysis highlighted the possibility that the predictor is nonlinear in some of the parameters. To build the nonlinear model, the authors follow several steps that are carefully detailed in their article. The model chosen uses probit and logarithmic link functions for the mean and dispersion submodels, respectively; its mean predictor includes a term in $\sqrt{x_{t4}}$ and its dispersion submodel is $\log(\sigma_t^2) = \gamma_1 + \gamma_2 x_{t4}^2$, with $t = 1, \ldots, 28$. Hereafter we refer to this model as 'Model-I'.
We emphasize that this simplex model outperformed a competing beta model [13]. Table 11 reports the interval estimates of the Model-I parameters at nominal levels of 90%, 95% and 99%. Across the three estimation methods, ML-Ia, Boot t and Bootp, the interval estimates for $\beta_1$, $\beta_2$, $\beta_4$ and $\beta_5$ are quite similar. For $\beta_3$, however, the bootstrap t estimates are substantially wider than those of its competitors at all nominal levels. For example, for $1 - \alpha = 0.95$, the ML-Ia, Boot t and Bootp interval estimates are, respectively, (−35.445; −20.240), (−40.929; −9.544) and (−34.745; −20.474). Another feature of the bootstrap t interval estimator is that some of its estimates include zero. This occurs for $\beta_2$ at the 99% level, (−0.466; 0.142), and for $\beta_4$ at the 95% and 99% levels. Nevertheless, $\beta_2 = 0$ implies both the exclusion of steam, a covariate recognized as important to the process, and the assumption of a linear predictor for the mean submodel. The most important information that the figures in Table 11 reveal, though, is that only the bootstrap t interval admits the possibility that $\gamma_1$ and $\gamma_2$ are simultaneously equal to zero, at both the 95% and 99% levels. Bootp reaches this conclusion only at the 99% level, whereas for the ML-Ia estimator only $\gamma_1 = 0$ is admissible, and only when $1 - \alpha = 0.99$. Therefore, we shall evaluate a nonlinear simplex model with constant dispersion.

Table 8. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boot t and Bootp intervals for $\beta_2$ in the model $\log(\mu_t/(1 - \mu_t)) = \beta_1 + x_{t2}^{\beta_2} + \beta_3 x_{t3} + \beta_4 x_{t4}$ and $\log(\sigma_t^2) = \gamma_1 + z_t^{\gamma_2}$, $t = 1, \ldots, n$, $\mu_t \in (0.02, 0.32)$, $\beta_2 = 1.2$, $n = 120$.
Among the competing models, the one with the best goodness-of-fit uses complementary log-log and logarithmic link functions for the mean and dispersion submodels, respectively, as follows: $\log(-\log(1.0 - \mu_t)) = \ldots \sqrt{x_{t4}}$ and … Table 12 reports the interval estimates of the 'Model-II' parameters at nominal levels of 90%, 95% and 99%. It is noteworthy that correct dispersion modeling improves the performance of the interval estimators. The accuracy of the Boot t and ML-Ia interval estimators for the $\beta_2$ parameter is … Here, we note that Bootp performs poorly. Recall that $\beta_2$, like $\beta_3$, is a parameter associated with the nonlinearity of the model. In fact, once dispersion was assumed constant, the Boot t scheme provided intervals for $\beta_3$ considerably shorter than those obtained under 'Model-I'.

Table 9. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boot t and Bootp intervals for $\beta_2$ in the model $\log(\mu_t/(1 - \mu_t)) = \beta_1 + x_{t2}^{\beta_2} + \beta_3 x_{t3} + \beta_4 x_{t4}$ and $\log(\sigma_t^2) = \gamma_1 + z_t^{\gamma_2}$, $t = 1, \ldots, n$, $n = 40$, $\lambda \approx 128$.

One area of research that we have been working on intensively concerns model selection criteria for nonlinear models. The $R^2_{FC}$ criterion proposed by [7] for the beta regression model, defined as the square of the correlation between $g(y)$ and $\hat{\eta}_1$, has proven quite effective in assessing the goodness-of-fit of models to data in the different applications we have performed on nonlinear models. The corrected version, $R^2_{FC_c}$, was proposed by [30] and is defined as $R^2_{FC_c} = 1 - (1 - R^2_{FC})(n - 1)/(n - (k_1 + q_1))$. Models I and II display $\{R^2_{FC}, R^2_{FC_c}\}$ measures equal to {0.6506, 0.5508} and {0.6818, 0.6095}, respectively. Thus, the choice of Model II is adequate, and inference for this model was based on the bootstrap t interval estimator.
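The corrected criterion quoted for Models I and II can be checked with a few lines of arithmetic. The sketch below applies the formula of [30] with n = 28. The parameter counts are an assumption, not stated explicitly in the text: k_1 + q_1 = 7 for Model I (five mean parameters plus two dispersion parameters) and k_1 + q_1 = 6 for Model II (five mean parameters plus a single constant-dispersion parameter). The function name is hypothetical.

```python
def corrected_r2_fc(r2_fc: float, n: int, n_params: int) -> float:
    """R2_FC_c = 1 - (1 - R2_FC) * (n - 1) / (n - (k1 + q1))."""
    if n <= n_params:
        raise ValueError("n must exceed the number of parameters k1 + q1")
    return 1.0 - (1.0 - r2_fc) * (n - 1) / (n - n_params)


n = 28  # number of observations in the FCC data set

# Assumed parameter counts (k1 + q1): 7 for Model I, 6 for Model II.
model_1 = corrected_r2_fc(0.6506, n, n_params=7)
model_2 = corrected_r2_fc(0.6818, n, n_params=6)

print(round(model_1, 4), round(model_2, 4))  # prints: 0.5508 0.6095
```

Under these assumed counts the reported corrected values are reproduced, which is consistent with Model II carrying one fewer dispersion parameter than Model I.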

Conclusion
In this paper we evaluated point and interval estimation, in small samples, for the parameters indexing the nonlinear simplex regression model [13]. Additionally, we proposed inferential improvements based on the bootstrap method.
MLEs can often be biased when the sample size is small or even moderate. We therefore compared the performance of the MLEs of the model parameters with that of their bootstrap-corrected versions. The Monte Carlo simulation results showed that, in general, the corrected estimators presented lower biases than the MLEs, evidencing the efficacy of the bootstrap scheme in bias correction. The MLEs of the parameters of the dispersion submodel are strongly biased, and the bootstrap-corrected estimator substantially reduces these biases, which reinforces the importance of using the proposed scheme for bias correction of the estimators of the nonlinear simplex regression model.
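The idea behind bootstrap bias correction can be illustrated on a toy problem. The sketch below is not the simplex regression scheme of this paper: it assumes a parametric bootstrap and uses the ML estimator of a normal variance, which is known to be biased downward in small samples. The bias is estimated as the mean of the bootstrap replicates minus the original estimate and is then subtracted from the original estimate. All names are hypothetical.

```python
import random
import statistics


def mle_variance(sample):
    """ML estimator of a normal variance (divides by n, hence biased downward)."""
    return statistics.pvariance(sample)


def bootstrap_bias_corrected_variance(sample, n_boot=2000, seed=0):
    """Return (corrected estimate, ML estimate, estimated bias)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = mle_variance(sample)
    mu_hat = statistics.fmean(sample)
    sigma_hat = theta_hat ** 0.5

    # Parametric bootstrap: simulate from the fitted normal model and re-estimate.
    replicates = []
    for _ in range(n_boot):
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        replicates.append(mle_variance(resample))

    bias_hat = statistics.fmean(replicates) - theta_hat
    corrected = theta_hat - bias_hat  # equals 2 * theta_hat - mean(replicates)
    return corrected, theta_hat, bias_hat


data_rng = random.Random(123)
sample = [data_rng.gauss(0.0, 2.0) for _ in range(10)]  # small sample, n = 10
corrected, theta_hat, bias_hat = bootstrap_bias_corrected_variance(sample)
print(theta_hat, bias_hat, corrected)
```

In this toy case the estimated bias is negative, so the corrected estimate is larger than the ML estimate, mirroring the direction of the correction one would expect for a variance-type parameter.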
Asymptotic confidence intervals based on MLEs usually require large samples for the coverage rates to be close to the nominal ones. The bootstrap method is an alternative for constructing adequate confidence intervals in small samples. Thus, we considered three competing interval estimators, namely the asymptotic MLE-based, percentile and bootstrap t intervals. Regarding coverage rates, the bootstrap t confidence interval outperformed its two competitors in every simulation scenario. Furthermore, in almost all experiments it was the best-balanced interval.
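The three interval estimators compared above can be sketched for a much simpler statistic than the simplex regression parameters. The example below assumes a nonparametric bootstrap for a sample mean, with the usual standard error in place of the inverse Fisher information. It is only an illustration of how the asymptotic, percentile and bootstrap t constructions differ. The helper names and the nearest-rank quantile rule are choices made for this sketch.

```python
import random
import statistics


def quantile(sorted_vals, p):
    """Empirical quantile by nearest rank on an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, int(p * len(sorted_vals))))
    return sorted_vals[idx]


def mean_and_se(sample):
    """Sample mean and its estimated standard error."""
    return statistics.fmean(sample), statistics.stdev(sample) / len(sample) ** 0.5


def intervals(sample, alpha=0.05, n_boot=2000, seed=1):
    rng = random.Random(seed)
    theta, se = mean_and_se(sample)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)

    boot_theta, boot_t = [], []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=len(sample))  # sample with replacement
        theta_b, se_b = mean_and_se(resample)
        if se_b == 0:  # degenerate resample, studentized statistic undefined
            continue
        boot_theta.append(theta_b)
        boot_t.append((theta_b - theta) / se_b)
    boot_theta.sort()
    boot_t.sort()

    lo, hi = alpha / 2, 1 - alpha / 2
    return {
        # Normal approximation: estimate +/- z * standard error.
        "asymptotic": (theta - z * se, theta + z * se),
        # Percentile: quantiles of the bootstrap estimates themselves.
        "percentile": (quantile(boot_theta, lo), quantile(boot_theta, hi)),
        # Bootstrap t: quantiles of the studentized statistic, reflected around theta.
        "bootstrap_t": (theta - quantile(boot_t, hi) * se,
                        theta - quantile(boot_t, lo) * se),
    }


data_rng = random.Random(7)
sample = [data_rng.expovariate(1.0) for _ in range(15)]  # small, skewed sample
result = intervals(sample)
for name, (lower, upper) in result.items():
    print(f"{name}: ({lower:.3f}; {upper:.3f})")
```

The bootstrap t interval uses the upper quantile of the studentized statistic for its lower limit and vice versa, which lets it become asymmetric around the estimate when the sampling distribution is skewed.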
As a penalty for this better performance, the bootstrap t interval is typically wider than its competitors. Overall, however, the bootstrap t interval proved to be the most appropriate interval estimator for nonlinear simplex regression, not only in the simulations but also in the application, where it was decisive. In a scenario with only n = 28 observations, it was able to point out the misspecification of the dispersion model, which led to a new and better-fitting model.

Writing - original draft: Alisson de Oliveira Silva.